Program Description¹

Open Court Reading® is a reading program for grades K-6 published by McGraw-Hill Education that is
designed to teach decoding, comprehension, inquiry, and writing in a three-part logical progression.
Part One of each unit, Preparing to Read, focuses on phonemic awareness, sounds and letters, phonics,
fluency, and word knowledge. Part Two, Reading and Responding, emphasizes reading literature for
understanding, comprehension, inquiry, and practical reading applications. Part Three, Language Arts,
focuses on writing, spelling, grammar, usage, mechanics, and basic computer skills. In 2007,
McGraw-Hill Education revised Open Court Reading® and changed the name to Imagine It!®. The studies
featured in this report evaluate the use of Open Court Reading® in grades K-3.

Research²

The What Works Clearinghouse (WWC) identified two studies of Open Court Reading® that both fall within
the scope of the Beginning Reading topic area and meet WWC group design standards. One study meets
WWC group design standards without reservations, and one study meets WWC group design standards with
reservations. Together, these studies included 1,113 beginning readers in grades 1-3 in six states.

The WWC considers the extent of evidence for Open Court Reading® on the reading skills of beginning
readers to be small for two outcome domains: general reading achievement and comprehension. There were
no studies that meet standards in the two other domains, so this intervention report does not report
on the effectiveness of Open Court Reading® for those domains. (See the Effectiveness Summary on p. 4
for more details of effectiveness by domain.)


Effectiveness 

Open Court Reading® was found to have potentially positive effects on general reading achievement and
comprehension for beginning readers.


Table 1. Summary of findings³

                                                             Improvement index
                                                             (percentile points)
Outcome domain                Rating of effectiveness        Average   Range     Number of studies   Number of students   Extent of evidence
General reading achievement   Potentially positive effects   +12       na        1                   434                  Small
Comprehension                 Potentially positive effects   +10       na        1                   679                  Small

na = not applicable

Program Information

Background 

Open Court Reading® is published by McGraw-Hill Education (formerly SRA/McGraw-Hill). The program was
originally developed in the 1960s and has undergone several revisions and updates over the years. In
2007, McGraw-Hill Education revised the program and changed the name to Imagine It!®. Address:
McGraw-Hill Education, P.O. Box 182605, Columbus, OH 43218. Website: https://www.mheonline.com/.
Telephone: (800) 338-3987.

Program details 

Open Court Reading® materials are divided by grade and include the Reading, Phonemic Awareness and
Phonics Kit (K); Sounds and Letters Workbook (K); Language Arts Skills Workbook (K); Big Books and
Little Books (K-1); Language Arts Big Book (K-1); Pre-Decodable and Decodable Texts (K-3); Part 1
Lesson Cards (K-3); Desk Strips (K-3); Unit Assessment (K-6); Transparencies (K-6); Writer’s Workbook
(K-6); Challenge Workbooks (K-6); Reteach Workbooks (K-6); Intervention Support (K-6); Phonics Skills
Workbook (1); First and Second Readers (1-2); Reading and Phonics Package (1-3); Student Anthologies
(1-6); Comprehension and Language Arts Workbook (1-6); Spelling and Vocabulary Skills Workbook (1-6);
Inquiry Journal (2-6); and Language Arts Handbook (2-6). The Teacher’s Edition (K-6) contains
information on providing systematic, explicit skills instruction centered on literature selections.
Lesson plans indicate the goals and objectives for each lesson and provide detailed suggestions for
implementation.

Open Court Reading® was revised and renamed Imagine It!® in 2007. Program revisions include increased
instruction in vocabulary, writing, and inquiry; stronger support for English learners; and enhanced
technology components.


Cost 

The Open Court Reading® curriculum includes grade-specific student textbooks, workbooks, decodable
books, and anthologies. Open Court Reading® Online Professional Development provides support for
teachers, principals, reading specialists, and coaches. For details on specific product pricing,
contact McGraw-Hill Education, the program publisher.

Research Summary

The WWC identified 185 studies that investigated the effects of Open Court Reading® on the reading
skills of beginning readers.

The WWC reviewed 29 of those studies against group design standards. One study (Borman, Dowling, &
Schneck, 2008) is a randomized controlled trial that meets WWC group design standards without
reservations, and one study (Skindrud & Gersten, 2006) uses a quasi-experimental design that meets WWC
group design standards with reservations. Those two studies are summarized in this report.
Twenty-seven studies do not meet WWC group design standards. One study does not meet WWC single-case
design standards. The remaining 155 studies do not meet WWC eligibility criteria for review in this
topic area. Citations for all 185 studies are in the References section, which begins on p. 6.

Summary of study meeting WWC group design standards without reservations

Borman et al. (2008) conducted a randomized controlled trial that examined the effects of Open Court
Reading® on first- through fifth-grade students attending five schools in five states during the
2005-06 school year. At each school, classrooms were randomly assigned within each grade either to
implement Open Court Reading® or to serve as the comparison group. The WWC based its effectiveness
rating on findings from 679 students from grades 1-3 who participated in the study; 300 students in
the Open Court Reading® group and 379 in the comparison group.⁴ The study reported student outcomes
after approximately 7 months of program implementation.

Summary of study meeting WWC group design standards with reservations

Skindrud and Gersten (2006) examined the effects of Open Court Reading® on second- through
fourth-grade students attending 12 schools in the Sacramento City Unified School District during the
1997-98 and 1998-99 school years. Four schools implementing Success for All® were matched to eight
schools that implemented Open Court Reading®. Schools were matched by poverty level as measured by the
percent of students eligible for free or reduced-price meals and percent of students on Aid to
Families with Dependent Children. The WWC based its effectiveness rating on findings from 434 students
from grades 2-3 who participated in the study; 292 in the Open Court Reading® group and 142 in the
comparison group.⁵ The study reported student outcomes at two points in time: at the end of second
grade and at the end of third grade. Findings at the end of third grade reflect maximum exposure to
the intervention by students and were used to determine the rating of effectiveness.⁶

Table 2. Scope of reviewed research

Grade             1, 2, 3
Delivery method   Whole class
Program type      Curriculum

Effectiveness Summary

The WWC review of Open Court Reading® for the Beginning Reading topic area includes student outcomes
in four domains: general reading achievement, comprehension, alphabetics, and reading fluency. The two
studies of Open Court Reading® that meet WWC group design standards reported findings in two of the
four domains: (a) general reading achievement and (b) comprehension. The findings below present the
authors’ estimates and WWC-calculated estimates of the size and statistical significance of the
effects of Open Court Reading® on beginning readers. For a more detailed description of the rating of
effectiveness and extent of evidence criteria, see the WWC Rating Criteria on p. 28.

Summary of effectiveness for the general reading achievement domain 

One study that meets WWC group design standards with reservations reported findings in the general
reading achievement domain.

Skindrud and Gersten (2006) found a statistically significant positive effect of Open Court Reading®
on the Stanford Achievement Test, 9th Edition (SAT-9). However, a correction for clustering was needed
and resulted in a WWC-computed p-value of .30; therefore, the WWC does not find the result to be
statistically significant. The WWC characterizes these study findings as a substantively important
positive effect.

Thus, for the general reading achievement domain, one study showed substantively important positive
effects. This results in a rating of potentially positive effects, with a small extent of evidence.
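
The effect of a clustering correction of this kind can be illustrated with a short sketch. It is not the WWC's own code: it assumes a Hedges-style adjustment of the t statistic and its degrees of freedom, an intraclass correlation of 0.20 (an assumed default that is not stated in this report), and a t statistic recovered from the rounded effect size of .31 reported for the SAT-9. The function names are hypothetical.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, using Simpson's rule on the density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

def cluster_adjusted_test(g, n1, n2, clusters, icc=0.20):
    """Shrink a student-level t statistic to account for school-level clustering.

    icc is an assumed intraclass correlation, not a value taken from the study.
    """
    n_total = n1 + n2
    n_bar = n_total / clusters  # average cluster size
    t = g * math.sqrt(n1 * n2 / n_total)  # approximate unadjusted t from the effect size
    numerator = (n_total - 2) - 2 * (n_bar - 1) * icc
    t_adj = t * math.sqrt(numerator / ((n_total - 2) * (1 + (n_bar - 1) * icc)))
    df = numerator ** 2 / (
        (n_total - 2) * (1 - icc) ** 2
        + n_bar * (n_total - 2 * n_bar) * icc ** 2
        + 2 * (n_total - 2 * n_bar) * icc * (1 - icc)
    )
    return t_adj, df, t_two_sided_p(t_adj, df)

# 292 intervention and 142 comparison students in 12 schools, effect size .31
t_adj, df, p = cluster_adjusted_test(0.31, 292, 142, 12)
print(round(t_adj, 2), round(df), round(p, 2))
```

Under these assumptions the adjusted p-value lands near the WWC-computed value of .30, far from the p < .01 reported by the study authors, which is why the finding is treated as not statistically significant.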


Table 3. Rating of effectiveness and extent of evidence for the general achievement domain

Rating of effectiveness: Potentially positive effects
Evidence of a positive effect with no overriding contrary evidence.
Criteria met: In the one study that reported findings, the estimated impact of the intervention on
outcomes in the general reading achievement domain was positive and substantively important.

Extent of evidence: Small
Criteria met: One study that included 434 students in 12 schools reported evidence of effectiveness in
the general reading achievement domain.


Summary of effectiveness for the comprehension domain 

One study that meets WWC group design standards without reservations reported findings in the comprehension 
domain. 

Borman et al. (2008) reported positive effects of Open Court Reading® on the Reading Composite score
of the Comprehensive Test of Basic Skills, 5th Edition (CTBS/5) Terra Nova test for students in grades
1, 2, and 3. Although the authors did not calculate the statistical significance of the effects for
the sample that aligns with the WWC’s Beginning Reading protocol, consisting of students in grades 1
through 3 only, the average effect size was large enough to be considered substantively important
according to WWC criteria (i.e., an effect size of at least .25). The WWC characterizes these study
findings as a substantively important positive effect.

Thus, for the comprehension domain, one study showed substantively important positive effects. This
results in a rating of potentially positive effects, with a small extent of evidence.
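
The way a single finding is characterized in the two summaries above can be expressed as a small decision rule. The sketch below is illustrative only: the .25 threshold comes from this report, while the function name, the .05 significance level, and the label wording for negative and indeterminate cases are assumptions added for completeness.

```python
def characterize_finding(effect_size, p_value=None, alpha=0.05, threshold=0.25):
    """Label one finding from its effect size and clustering-corrected p-value.

    p_value may be None when no p-value is available for the contrast.
    """
    significant = p_value is not None and p_value < alpha
    if significant and effect_size > 0:
        return "statistically significant positive"
    if significant and effect_size < 0:
        return "statistically significant negative"
    if effect_size >= threshold:
        return "substantively important positive"
    if effect_size <= -threshold:
        return "substantively important negative"
    return "indeterminate"

# General reading achievement: effect size .31, corrected p-value .30
print(characterize_finding(0.31, 0.30))
# Comprehension: effect size .26, no author-reported p-value for grades 1-3
print(characterize_finding(0.26))
```

Both calls return "substantively important positive", matching the characterization given for the two studies in this report.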

Table 4. Rating of effectiveness and extent of evidence for the comprehension domain

Rating of effectiveness: Potentially positive effects
Evidence of a positive effect with no overriding contrary evidence.
Criteria met: In the one study that reported findings, the estimated impact of the intervention on
outcomes in the comprehension domain was positive and substantively important.

Extent of evidence: Small
Criteria met: One study that included 679 students in five schools reported evidence of effectiveness
in the comprehension domain.

Appendix A.1: Research details for Borman et al. (2008)

Borman, G. D., Dowling, N. M., & Schneck, C. (2008). A multi-site cluster randomized field trial of Open
Court Reading. Educational Evaluation and Policy Analysis, 30(4), 389-407.

Table A1. Summary of findings                    Meets WWC group design standards without reservations

                                  Study findings
Outcome domain   Sample size      Average improvement index (percentile points)   Statistically significant
Comprehension    679 students     +10                                             No

Setting 

The study initially included six schools— one each in Florida, Georgia, Idaho, Indiana, North 
Carolina, and Texas. Two schools were from rural areas, two from suburban areas, and two 
from urban areas. The Georgia school dropped out of the study. 

Study sample 

McGraw-Hill Education recruited a group of schools that had not previously used Open Court 
Reading® to participate in the study. The six schools that initially participated were given free 
Open Court Reading® materials, as well as a teacher training program and implementation 
support. At each school, classrooms were randomly assigned within each grade either to be 
enrolled in Open Court Reading® or to serve as the comparison group. 

The entire study sample consisted of 57 grade 1-5 classrooms containing a total of 1,099
students. The sample considered in this review, which aligns to the Beginning Reading review 
protocol, initially consisted of 44 grade 1-3 classrooms containing a total of 855 students. 
After attrition, the combined analysis sample consisted of 36 classrooms containing 679 
students in grades 1-3; 379 students in the 20 Open Court Reading® classrooms and 300 
students in 16 comparison classrooms. Of the participating students, more than 70% were 
minorities, and more than 75% were eligible for free or reduced-price lunches. Fewer than 
15% were English as Second Language (ESL) students, and fewer than 10% were special 
education students. 

Intervention group

Open Court Reading® is a curriculum that includes textbooks, workbooks, decodable books,
and anthologies. The curriculum consists of three main components: (a) Preparing to Read, (b)
Reading and Responding, and (c) Language Arts. For this study, teachers were given a teacher’s
edition of the curriculum that included scripted direct instruction lessons and diagnostic
and assessment packages. The program is designed to be used for 2.5 hours per day with
grades 1-2 and for 2 hours per day with grades 4-6. However, the authors report that external
consultants observed that some teachers provided only 90 minutes of daily instruction. The
intervention was implemented from fall to spring during the 2005-06 school year.

Comparison group

The comparison classrooms used a “business-as-usual” curriculum and were instructed not to
use Open Court Reading® or any of its materials. Principals mentioned that curricula currently
in use by the comparison classrooms consisted of Reading Street by Scott Foresman, Literacy
Place by Scholastic, McGraw-Hill Reading by MacMillan/McGraw-Hill, Collections by Harcourt,
and Trophies by Harcourt. Consultants visited comparison classrooms and verified that
they were not using Open Court Reading®.

Outcomes and measurement

For both the pretest (October 2005) and the posttest (May 2006), students took the CTBS/5
Terra Nova Reading Comprehension and Vocabulary subtests. A Reading Composite score
was also reported, which is the average of these two subtest measures. For a more detailed
description of these outcome measures, see Appendix B. Findings for the combined student
sample on the Reading Composite score can be found in Appendix C.2. Additional findings
reflecting subtest outcomes separately for grades 1, 2, and 3 can be found in Appendix D.2.

Support for implementation

Teachers were provided training opportunities with external consultants, which consisted of
2- to 3-day summer workshops. In addition, the consultants, who had teaching experience
and detailed knowledge of Open Court Reading® and were trained by McGraw-Hill Education,
visited and observed classrooms and provided feedback to teachers.


Appendix A.2: Research details for Skindrud and Gersten (2006) 

Skindrud, K., & Gersten, R. (2006). An evaluation of two contrasting approaches for improving reading 
achievement in a large urban district. Elementary School Journal, 106(5), 389-407. 


Table A2. Summary of findings                         Meets WWC group design standards with reservations

                                               Study findings
Outcome domain                Sample size      Average improvement index (percentile points)   Statistically significant
General reading achievement   434 students     +12                                             No

Setting

The study was conducted in 12 schools in the Sacramento City Unified School District
(SCUSD), a large urban district in northern California.

Study sample

Under California’s interpretation of Reading First, all 59 elementary schools in SCUSD were
required to implement one of two models of reading instruction, Success for All (SFA)® or
Open Court Reading®. In the fall of 1997, four schools implemented SFA®. A matched sample
of Open Court Reading® schools was created by rank-ordering SCUSD schools by poverty
level (measured by the percent of students eligible for free or reduced-price meals and
percent of students on Aid to Families with Dependent Children) and selecting two comparison
schools for each SFA® school: those ranked just above and just below each SFA® school.
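
The matching procedure described above can be sketched as follows. This is an illustration only, not the authors' procedure in code: the school names and poverty values are made up, the function name is hypothetical, and the sketch assumes a single poverty score per school, whereas the study combined two poverty indicators in a way this report does not detail.

```python
def match_neighbors(schools, target_names):
    """For each target school, pick the nearest non-target school ranked
    just below and just above it on the poverty measure.

    schools: list of (name, poverty_score) pairs.
    """
    ranked = sorted(schools, key=lambda s: s[1])
    targets = set(target_names)
    matches = {}
    for i, (name, _) in enumerate(ranked):
        if name not in targets:
            continue
        lower = next((ranked[j][0] for j in range(i - 1, -1, -1)
                      if ranked[j][0] not in targets), None)
        higher = next((ranked[j][0] for j in range(i + 1, len(ranked))
                       if ranked[j][0] not in targets), None)
        matches[name] = [n for n in (lower, higher) if n is not None]
    return matches

# Hypothetical schools with made-up poverty percentages
schools = [("A", 95), ("SFA-1", 90), ("B", 88), ("C", 80), ("SFA-2", 75), ("D", 70)]
print(match_neighbors(schools, ["SFA-1", "SFA-2"]))
```

With four SFA® schools and two neighbors each, a procedure of this kind yields the eight Open Court Reading® schools used in the study. The sketch does not handle the case in which two target schools would claim the same neighbor.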

The study included two cohorts of students: students in Cohort 1 began using the reading
programs in grade 2, while students in Cohort 2 began in grade 3. A total of 936 students in
Cohort 1 and Cohort 2 participated in the study, including students continually enrolled at
study schools from fall 1997 to spring 1999 who completed all study tests and did not repeat
a grade. The WWC based its effectiveness rating on findings from 434 Cohort 1 students who
participated in the study: 292 in the Open Court Reading® group and 142 in the comparison
group. These students were followed from second to third grade. Results for the Cohort 2
students are not included in this report because, based on information obtained from the authors,
that sample of students was not equivalent on key characteristics at baseline.

Intervention group

Students in the intervention group received reading instruction using Open Court Reading®, a
systematic approach to teaching alphabetics, print knowledge, and phonemic awareness. For
this study, the district used the 1996 version of the curriculum, Open Court Collections for Young
Scholars. Two hours of daily whole-class reading instruction was followed by 30 minutes of
small-group instruction and/or independent work. All study students received a condensed
selection of instructional content to “catch-up” students to Open Court Reading® content that
they had not received in prior years (since they began using the curriculum in either second or
third grade).

Comparison group

Students in the comparison group received reading instruction through SFA®. Students were
put into homogeneous groups, across classrooms and grades, based on reading skills. They
received 90 minutes of reading instruction daily, outside of their homerooms. SFA® also
prescribes additional writing instruction outside of these groups. The SFA® training consultants
monitored implementation fidelity and observed additional writing instruction in all study schools
during both study years. The authors noted that teachers in SFA® schools frequently included
additional spelling and grammar, along with writing instruction, outside of the 90-minute reading
block. A core reading curriculum is only prescribed in grades K-1; in grades 2-6, the schools
can choose their own reading curricula. The authors state that the materials and guidelines for
instruction (Reading Roots for grade 1, and Reading Wings for grades 2-4), as well as the
professional development, tutoring, and the SFA® school facilitator and regional consultant
oversight procedures, all followed those outlined by the developers of the curriculum.

Outcomes and measurement

The outcome measure was the Reading subtest from the SAT-9, administered in both spring
1998 and spring 1999. The authors converted all measures to normal curve equivalent scores.
For a more detailed description of this outcome measure, see Appendix B. The Language
subtest from the SAT-9 was reported by the authors; however, this outcome measure is not
included in this report because it is not an eligible outcome under the Beginning Reading
evidence review protocol. The intermediate findings (after 1 year of implementation) for second
graders are reported in Appendix D.1.

Support for implementation

At Open Court Reading® schools, teachers received 4 days of basic grade-level training in
year 1, followed by 4 days of advanced grade-level training in year 2. Each Open Court
Reading® school received a reading coach (either full-time or part-time, depending on school size).
Curriculum experts met monthly with reading coaches and administrators to refine instruction
and supervision and to solve problems. Reading coaches collected implementation information
but were prohibited from sharing the information with the study authors; the district-level
reading coordinator indicated that although some schools had implementation problems at the
beginning of the study, these were resolved by the second study year.

At SFA® schools, training and technical assistance was provided by SFA® consultants from a
regional SFA® office. The SFA® consultants assessed implementation fidelity and rated it as a
typical level of implementation when compared with national implementation averages.

Appendix B: Outcome measures for each domain


General reading achievement

Stanford Achievement Test, 9th Edition (SAT-9)

The outcome measure was the Reading subtest from the SAT-9, administered in both spring 1998 and spring
1999. The authors converted all assessment scores to normal curve equivalent scores (as cited in Skindrud &
Gersten, 2006).

Comprehension

Comprehensive Test of Basic Skills, 5th Edition (CTBS/5) Terra Nova Reading Composite score

This assessment consists of two subtests, Reading Comprehension and Vocabulary, and combines
selected-response items with constructed-response items that allow students to produce short and extended
responses. The Reading Composite score is a simple average of the CTBS/5 Reading Comprehension and
Vocabulary subtests described below (as cited in Borman et al., 2008).

Reading comprehension construct

CTBS/5 Terra Nova Reading Comprehension subtest

This assessment combines selected-response items with constructed-response items that allow students to
produce short and extended responses. The Reading Comprehension subtest items focus on five objectives:
(a) oral comprehension of passages read aloud, (b) basic understanding of literal meanings of passages, (c)
analyzing text, (d) evaluating and extending meaning, and (e) identifying reading strategies (as cited in
Borman et al., 2008).

Vocabulary development construct

CTBS/5 Terra Nova Vocabulary subtest

This assessment combines selected-response items with constructed-response items that allow students to
produce short and extended responses. The Vocabulary subtest focuses on three objectives: (a) understanding
word meaning, (b) identifying multi-meaning words, and (c) inferring words in context (as cited in Borman
et al., 2008).

Appendix C.1: Findings included in the rating for the general reading achievement domain

                                                                                                        Mean (standard deviation)       WWC calculations
Outcome measure                                   Study sample              Sample size                 Intervention    Comparison      Mean        Effect  Improvement  p-value
                                                                                                        group           group           difference  size    index
Skindrud & Gersten, 2006ᵃ
Stanford Achievement Test, 9th Edition (SAT-9)    Grade 3                   12 schools/434 students     43.90 (16.50)   38.60 (18.50)   5.30        .31     +12          <.01

Domain average for general reading achievement (Skindrud & Gersten, 2006)                                                                           .31     +12          Not statistically significant
Domain average for general reading achievement across all studies                                                                                   .31     +12          na

Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors
the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are
given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in
an average individual’s percentile rank that can be expected if the individual is given the intervention. The statistical significance of the study’s domain average was determined by
the WWC. Some statistics may not sum as expected due to rounding. na = not applicable.

ᵃ For Skindrud and Gersten (2006), the p-value presented here was reported in the original study. A correction for clustering was needed and resulted in a WWC-computed p-value
of .30 for the SAT-9; therefore, the WWC does not find the result to be statistically significant. The reported group means are based on an analysis of covariance (ANCOVA), which
adjusted for pretest. This study is characterized as having a substantively important positive effect because the domain average effect size is larger than .25. For more information,
please refer to the WWC Standards and Procedures Handbook (version 3.0), p. 26.
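
The effect size and improvement index in the table above can be approximately reproduced from the reported means, standard deviations, and group sizes. The sketch below assumes the usual Hedges' g with a small-sample correction and an improvement index defined as the normal-curve percentile of the effect size minus 50; the function names are illustrative and the exact WWC computation may differ in detail.

```python
import math
from statistics import NormalDist

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return correction * (mean_t - mean_c) / math.sqrt(pooled_var)

def improvement_index(g):
    """Expected change in percentile rank for an average comparison student."""
    return (NormalDist().cdf(g) - 0.5) * 100

# SAT-9, grade 3: 292 intervention students and 142 comparison students
g = hedges_g(43.90, 16.50, 292, 38.60, 18.50, 142)
print(round(g, 2), round(improvement_index(g)))
```

Rounded to two decimals and to a whole percentile point, the results agree with the .31 and +12 shown in the table.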


Appendix C.2: Findings included in the rating for the comprehension domain

                                                                                                        Mean (standard deviation)       WWC calculations
Outcome measure                                   Study sample              Sample size                 Intervention    Comparison      Mean        Effect  Improvement  p-value
                                                                                                        group           group           difference  size    index
Borman et al., 2008ᵃ
Comprehensive Test of Basic Skills, 5th Edition
(CTBS/5) Reading Composite score                  Grade 1-3                 36 classrooms/679 students  603.07 (47.63)  590.98 (45.00)  12.09       .26     +10          nr

Domain average for comprehension (Borman et al., 2008)                                                                                              .26     +10          Not statistically significant
Domain average for comprehension across all studies                                                                                                 .26     +10          na


Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors
the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who
are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change
in an average individual’s percentile rank that can be expected if the individual is given the intervention. The WWC-computed average effect size is a simple average rounded to
two decimal places; the average improvement index is calculated from the average effect size. The statistical significance of each study’s domain average was determined by the
WWC. Some statistics may not sum as expected due to rounding. nr = not reported. na = not applicable.

ᵃ For Borman et al. (2008), a correction for clustering was needed but did not affect whether the contrast was found to be statistically significant. The WWC aggregated means and
pooled standard deviations for grades 1-3 to align to the Beginning Reading topic area protocol. The authors presented grade-level means and standard deviations but did not report a
p-value for the comparison of grades 1-3, and the WWC-calculated p-value for this comparison was larger than .05. The effect size in the table is based on the grade-level means and
standard deviations in Table 3 of Borman et al. (2008). This study is characterized as having a substantively important positive effect because the domain average effect size is larger
than .25. For more information, please refer to the WWC Standards and Procedures Handbook (version 3.0), p. 26.

Appendix D.1: Description of supplemental findings for the general reading achievement domain

                                                                                                        Mean (standard deviation)       WWC calculations
Outcome measure                                   Study sample              Sample size                 Intervention    Comparison      Mean        Effect  Improvement  p-value
                                                                                                        group           group           difference  size    index
Skindrud & Gersten, 2006ᵃ
Stanford Achievement Test, 9th Edition (SAT-9)    Grade 2                   12 schools/434 students     44.30 (17.10)   37.20 (16.80)   7.10        .42     +16          <.01
SAT-9                                             Bottom quartile, grade 2  12 schools/108 students     33.60 (13.70)   25.80 (5.90)    7.80        .66     +24          nr
SAT-9                                             Bottom quartile, grade 3  12 schools/108 students     34.60 (13.10)   25.40 (14.20)   9.20        .68     +25          nr

Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating.
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison
group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average
individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. nr = not reported.

ᵃ For Skindrud and Gersten (2006), the p-values presented here were reported in the original study. Note that the authors did not conduct univariate statistical tests for all reported
outcomes. For example, the two bottom quartile reading outcomes (in grade 2 and grade 3) were jointly significant at p < .001. The WWC does not find the results to be statistically
significant after the correction for clustering and multiple comparisons adjustment were performed. A correction for clustering was needed and resulted in WWC-computed p-values
of .30, .054, and .047, respectively. A correction for multiple comparisons was needed for the two bottom-quartile outcomes and resulted in a WWC-computed critical p-value of .03,
which is smaller than the corresponding p-value of .047 for the SAT-9 outcome in grade 3. The reported group means are ANCOVA-adjusted. The effect sizes reported here differ
from those reported in the original study due to differences in the effect-size formulas used; WWC uses Hedges’ g statistic, while the study appears to use the Cohen’s d statistic to
calculate effect sizes.
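
The multiple comparisons adjustment for the two bottom-quartile outcomes can be illustrated with a Benjamini-Hochberg style procedure, which is assumed here to be the method behind the critical p-value quoted in the note. The sketch below is a generic implementation with a hypothetical function name, not the WWC's own code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (p_value, critical_value, significant) for each input p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    critical = [0.0] * m
    for rank, i in enumerate(order, start=1):
        critical[i] = rank * alpha / m
    # Largest rank whose p-value falls at or below its critical value
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= critical[i]:
            max_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= max_rank
    return list(zip(p_values, critical, significant))

# Clustering-corrected p-values for the bottom-quartile outcomes (grade 2, grade 3)
for p, crit, sig in benjamini_hochberg([0.054, 0.047]):
    print(p, round(crit, 3), sig)
```

The smaller p-value (.047) is compared against a critical value of .025, which matches the .03 in the note after rounding, and neither outcome is flagged as statistically significant.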


Appendix D.2: Description of supplemental findings for the comprehension domain

                                                                                                        Mean (standard deviation)       WWC calculations
Outcome measure                                   Study sample              Sample size                 Intervention    Comparison      Mean        Effect  Improvement  p-value
                                                                                                        group           group           difference  size    index
Borman et al., 2008ᵃ
Comprehensive Test of Basic Skills, 5th Edition
(CTBS/5), Reading Composite score                 Grade 1                   16 classrooms/304 students  575.79 (37.19)  567.81 (42.26)  7.98        .20     +8           nr
CTBS/5, Terra Nova Vocabulary subtest             Grade 1                   16 classrooms/304 students  563.59 (46.01)  551.72 (49.93)  11.87       .25     +10          nr
CTBS/5, Terra Nova Reading Comprehension subtest  Grade 1                   16 classrooms/304 students  587.46 (36.15)  583.35 (42.30)  4.11        .10     +4           nr
CTBS/5, Reading Composite score                   Grade 2                   11 classrooms/207 students  610.01 (37.50)  599.97 (35.10)  10.04       .27     +11          nr
CTBS/5, Terra Nova Vocabulary subtest             Grade 2                   11 classrooms/207 students  596.71 (43.41)  590.41 (42.10)  6.30        .15     +6           nr
CTBS/5, Terra Nova Reading Comprehension subtest  Grade 2                   11 classrooms/207 students  622.74 (38.51)  608.99 (39.18)  13.75       .35     +14          nr
CTBS/5, Reading Composite score                   Grade 3                   9 classrooms/168 students   642.44 (45.35)  623.63 (35.42)  18.81       .45     +18          nr
CTBS/5, Terra Nova Vocabulary subtest             Grade 3                   9 classrooms/168 students   633.18 (48.64)  616.46 (40.25)  16.72       .37     +14          nr
CTBS/5, Terra Nova Reading Comprehension subtest  Grade 3                   9 classrooms/168 students   651.17 (47.14)  630.23 (35.44)  20.94       .49     +19          nr


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison
group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average
individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. nr = not reported.

ᵃ For Borman et al. (2008), corrections for clustering and multiple comparisons were needed. The authors did not report p-values for the grade-specific contrasts; rather, they
described the effect sizes for these contrasts. The effect size in the table is based on the grade-level means and standard deviations in Table 3 of Borman et al. (2008). WWC calculations
show no statistically significant differences between the intervention and comparison groups for all of these outcome measures (p-values > .05).
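
The relationship between the means, the effect size, and the improvement index in the table can be sketched as follows. This is an illustration under stated assumptions, not the WWC's own code: the function names are hypothetical, the effect size is computed as a pooled-standard-deviation mean difference with a small-sample correction (Hedges' g), and the per-group sample sizes are assumed to be an even 152/152 split of the 304 grade 1 students, because the table reports only the total.

```python
from math import sqrt
from statistics import NormalDist


def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction."""
    pooled_sd = sqrt(
        ((n_i - 1) * sd_i**2 + (n_c - 1) * sd_c**2) / (n_i + n_c - 2)
    )
    correction = 1 - 3 / (4 * (n_i + n_c) - 9)
    return correction * (mean_i - mean_c) / pooled_sd


def improvement_index(effect_size):
    """Expected percentile-rank change for a student starting at the 50th percentile."""
    return 100 * NormalDist().cdf(effect_size) - 50


# Grade 1 Reading Composite row; the 152/152 split is an assumption.
g = hedges_g(575.79, 37.19, 152, 567.81, 42.26, 152)
print(round(g, 2), round(improvement_index(g)))
```

Under these assumptions the sketch gives values that round to the .20 effect size and +8 improvement index shown in the first row. Applying `improvement_index` to an already rounded effect size can differ from the table by a point, since the table values are presumably derived from unrounded effect sizes.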


Open Court Reading® Updated October 2014 




WWC Intervention Report 


Endnotes 

¹ The descriptive information for this program was obtained from a publicly available source: the program’s website
(https://www.mheonline.com, downloaded February 2014). The WWC requests developers review the program description sections for
accuracy from their perspective. The program description was provided to the developer in March 2014, and the WWC incorporated
feedback from the developer. Further verification of the accuracy of the descriptive information for this program is beyond the scope
of this review.

² The literature search reflects documents publicly available by December 2013. The previous report was released in August 2008.
This report has been updated to include reviews of 68 studies that have been released since 2008, and 91 studies that were released
prior to 2008 but were not included in the earlier report. Of the additional studies, 134 were not within the scope of the review protocol
for the Beginning Reading topic area, and 25 were within the scope of the review protocol for the Beginning Reading topic area
but did not meet WWC group design standards. A complete list and disposition of all studies reviewed are provided in the references.
One new study (Borman et al., 2008) meets WWC group design standards without reservations. One study from the 2008 report
(Skindrud & Gersten, 2006) received a revised rating in this report of meets WWC group design standards with reservations, where it
had previously received the rating of does not meet WWC group design standards. In the version 1.0 standards used to review the
2008 version of the intervention report, a statistically significant (p < .05) difference on key baseline characteristics was sufficient to have
a quasi-experiment receive a rating of does not meet WWC group design standards. However, in the WWC’s version 3.0 standards,
if baseline differences between intervention and comparison groups are between .05 and .25 standard deviations, the study can still
meet standards after a proper statistical adjustment in the impact analysis. The studies in this report were reviewed using the Standards
from the WWC Procedures and Standards Handbook (version 3.0), along with those described in the Beginning Reading review
protocol (version 2.1). The evidence presented in this report is based on available research. Findings and conclusions may change as
new research becomes available.

³ For criteria used in the determination of the rating of effectiveness and extent of evidence, see the WWC Rating Criteria on p. 28. These
improvement index numbers show the average and range of student-level improvement indices for all findings across the studies.

⁴ Results for grades 4 and 5 (Borman et al., 2008) are reported in the WWC Adolescent Literacy Open Court Reading® intervention
report.

⁵ The study (Skindrud & Gersten, 2006) was conducted over 2 school years and analyzed two separate cohorts of students; Cohort 1
students began in grade 2, and Cohort 2 students began in grade 3. The sample of students in Cohort 1 meets the WWC baseline
equivalence standard and is included in this report. The sample of students in Cohort 2 does not meet the WWC baseline equivalence
standard and is excluded from this report.

⁶ The findings considered for the effectiveness rating reflect the maximum exposure of students to the program. For example, in the
second year of Open Court Reading® implementation, third graders (from Cohort 1) had been exposed to the program over a period of
2 school years (when they were in the second and third grades). The corresponding intermediate findings (after 1 year of implementation)
for second graders from the same Cohort 1 are reported in Appendix D.2 and were not used in the effectiveness rating.

Recommended Citation 

U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse. (2014, October). 
Beginning Reading intervention report: Open Court Reading®. Retrieved from http://whatworks.ed.gov 

WWC Rating Criteria

Criteria used to determine the rating of a study 

Study rating

Criteria

Meets WWC group design
standards without reservations

A study that provides strong evidence for an intervention’s effectiveness, such as a well-implemented RCT.

Meets WWC group design
standards with reservations

A study that provides weaker evidence for an intervention's effectiveness, such as a QED or an RCT with high
attrition that has established equivalence of the analytic samples.

Criteria used to determine the rating of effectiveness for an intervention 

Rating of effectiveness 

Criteria 

Positive effects 

Two or more studies show statistically significant positive effects, at least one of which met WWC evidence
standards for a strong design, AND

No studies show statistically significant or substantively important negative effects. 

Potentially positive effects 

At least one study shows a statistically significant or substantively important positive effect, AND 

No studies show a statistically significant or substantively important negative effect AND fewer or the same number
of studies show indeterminate effects than show statistically significant or substantively important positive effects.

Mixed effects 

At least one study shows a statistically significant or substantively important positive effect AND at least one study 
shows a statistically significant or substantively important negative effect, but no more such studies than the number 
showing a statistically significant or substantively important positive effect, OR 

At least one study shows a statistically significant or substantively important effect AND more studies show an 
indeterminate effect than show a statistically significant or substantively important effect. 

Potentially negative effects 

One study shows a statistically significant or substantively important negative effect and no studies show 
a statistically significant or substantively important positive effect, OR 

Two or more studies show statistically significant or substantively important negative effects, at least one study 
shows a statistically significant or substantively important positive effect, and more studies show statistically 
significant or substantively important negative effects than show statistically significant or substantively important 
positive effects. 

Negative effects 

Two or more studies show statistically significant negative effects, at least one of which met WWC evidence 
standards for a strong design, AND 

No studies show statistically significant or substantively important positive effects. 

No discernible effects 

None of the studies shows a statistically significant or substantively important effect, either positive or negative. 
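
The effectiveness criteria above can be read as a decision procedure over the studies in a domain. The sketch below is one possible reading, not the WWC's implementation. The function name, the effect labels, and the `(effect, strong_design)` input format are hypothetical. The order in which the rules are checked is also an assumption (the two strongest ratings first, then the remaining ratings in the order listed), because some combinations of studies satisfy more than one rule as written, and a few are not covered by any rule.

```python
POSITIVE = ("sig_pos", "subst_pos")
NEGATIVE = ("sig_neg", "subst_neg")


def rate_effectiveness(studies):
    """Rate a domain from a list of (effect, strong_design) tuples.

    effect is one of "sig_pos", "subst_pos", "indeterminate",
    "subst_neg", "sig_neg"; strong_design is True for a study that
    met standards for a strong design.
    """
    sig_pos = [s for s in studies if s[0] == "sig_pos"]
    sig_neg = [s for s in studies if s[0] == "sig_neg"]
    pos = sum(1 for effect, _ in studies if effect in POSITIVE)
    neg = sum(1 for effect, _ in studies if effect in NEGATIVE)
    indet = sum(1 for effect, _ in studies if effect == "indeterminate")

    # Strongest ratings first (assumed ordering).
    if len(sig_pos) >= 2 and any(strong for _, strong in sig_pos) and neg == 0:
        return "Positive effects"
    if len(sig_neg) >= 2 and any(strong for _, strong in sig_neg) and pos == 0:
        return "Negative effects"

    if pos >= 1 and neg == 0 and indet <= pos:
        return "Potentially positive effects"

    if pos >= 1 and neg >= 1 and neg <= pos:
        return "Mixed effects"
    if pos + neg >= 1 and indet > pos + neg:
        return "Mixed effects"

    if neg == 1 and pos == 0:
        return "Potentially negative effects"
    if neg >= 2 and pos >= 1 and neg > pos:
        return "Potentially negative effects"

    if pos == 0 and neg == 0:
        return "No discernible effects"

    # The criteria as listed do not cover every combination.
    return "Unclassified"


print(rate_effectiveness([("subst_pos", True)]))
```

For example, a domain with a single study showing a substantively important positive effect and no other findings satisfies the "Potentially positive effects" rule under this reading.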

Criteria used to determine the extent of evidence for an intervention 

Extent of evidence 

Criteria 

Medium to large 

The domain includes more than one study, AND 
The domain includes more than one school, AND 

The domain findings are based on a total sample size of at least 350 students, OR, assuming 25 students in a class, 
a total of at least 14 classrooms across studies. 

Small 

The domain includes only one study, OR 
The domain includes only one school, OR 

The domain findings are based on a total sample size of fewer than 350 students, AND, assuming 25 students 
in a class, a total of fewer than 14 classrooms across studies. 
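
The extent-of-evidence criteria reduce to a single conjunction, sketched below. The function and parameter names are hypothetical, and the example inputs are made-up values rather than figures from this report.

```python
def extent_of_evidence(n_studies, n_schools, n_students=0, n_classrooms=0):
    """Classify the extent of evidence for one outcome domain.

    Sample size may be given in students, in classrooms, or both;
    14 classrooms corresponds to 350 students at 25 students per class.
    """
    large_enough_sample = n_students >= 350 or n_classrooms >= 14
    if n_studies > 1 and n_schools > 1 and large_enough_sample:
        return "Medium to large"
    return "Small"


# Hypothetical domains: one study is always "Small", however large the sample.
print(extent_of_evidence(n_studies=1, n_schools=12, n_students=900))
print(extent_of_evidence(n_studies=2, n_schools=12, n_students=900))
```

Because all three conditions must hold for "Medium to large", failing any one of them is enough to produce "Small", which matches the OR/AND structure of the two table rows.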

Glossary of Terms

Attrition

Attrition occurs when an outcome variable is not available for all participants initially assigned
to the intervention and comparison groups. The WWC considers the total attrition rate and
the difference in attrition rates across groups within a study.

Clustering adjustment

If intervention assignment is made at a cluster level and the analysis is conducted at the student
level, the WWC will adjust the statistical significance to account for this mismatch, if necessary.

Confounding factor

A confounding factor is a component of a study that is completely aligned with one of the
study conditions, making it impossible to separate how much of the observed effect was
due to the intervention and how much was due to the factor.

Design

The design of a study is the method by which intervention and comparison groups were assigned.

Domain

A domain is a group of closely related outcomes.

Effect size

The effect size is a measure of the magnitude of an effect. The WWC uses a standardized
measure to facilitate comparisons across studies and outcomes.

Eligibility

A study is eligible for review and inclusion in this report if it falls within the scope of the
review protocol and uses either an experimental or matched comparison group design.

Equivalence

A demonstration that the analysis sample groups are similar on observed characteristics
defined in the review area protocol.

Extent of evidence

An indication of how much evidence supports the findings. The criteria for the extent
of evidence levels are given in the WWC Rating Criteria on p. 28.

Improvement index

Along a percentile distribution of students, the improvement index represents the gain
or loss of the average student due to the intervention. As the average student starts at
the 50th percentile, the measure ranges from -50 to +50.

Multiple comparison adjustment

When a study includes multiple outcomes or comparison groups, the WWC will adjust
the statistical significance to account for the multiple comparisons, if necessary.

Quasi-experimental design (QED)

A quasi-experimental design (QED) is a research design in which study participants are
assigned to intervention and comparison groups through a process that is not random.

Randomized controlled trial (RCT)

A randomized controlled trial (RCT) is an experiment in which eligible study participants are
randomly assigned to intervention and comparison groups.

Rating of effectiveness

The WWC rates the effects of an intervention in each domain based on the quality of the
research design and the magnitude, statistical significance, and consistency in findings. The
criteria for the ratings of effectiveness are given in the WWC Rating Criteria on p. 28.

Single-case design

A research approach in which an outcome variable is measured repeatedly within and
across different conditions that are defined by the presence or absence of an intervention.

Standard deviation

The standard deviation of a measure shows how much variation exists across observations
in the sample. A low standard deviation indicates that the observations in the sample tend
to be very close to the mean; a high standard deviation indicates that the observations in
the sample tend to be spread out over a large range of values.

Statistical significance

Statistical significance is the probability that the difference between groups is a result of
chance rather than a real difference between the groups. The WWC labels a finding statistically
significant if the likelihood that the difference is due to chance is less than 5% (p < .05).

Substantively important

A substantively important finding is one that has an effect size of 0.25 or greater, regardless
of statistical significance.
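
The two thresholds defined in the glossary (p < .05 for statistical significance and an effect size of 0.25 or greater for substantive importance) can be combined into a simple labeling rule for a single finding. The sketch below is illustrative: the function name and return strings are hypothetical, the 0.25 threshold is applied to the absolute effect size so that negative findings are labeled symmetrically (an assumption), and `None` stands in for a p-value that is not reported.

```python
def classify_finding(effect_size, p_value=None, alpha=0.05, threshold=0.25):
    """Label one finding using the glossary's two thresholds.

    p_value=None represents a p-value that was not reported.
    """
    if effect_size == 0:
        return "indeterminate"
    direction = "positive" if effect_size > 0 else "negative"
    if p_value is not None and p_value < alpha:
        return f"statistically significant {direction}"
    if abs(effect_size) >= threshold:
        return f"substantively important {direction}"
    return "indeterminate"


print(classify_finding(0.45))        # no p-value reported
print(classify_finding(0.10, 0.30))
print(classify_finding(-0.30, 0.01))
```

Statistical significance is checked first here, so a finding that is both significant and large is labeled by its significance; that ordering is a design choice of this sketch.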


Please see the WWC Procedures and Standards Handbook (version 3.0) for additional details. 